
Algorithm 1 MCN training. L is the loss function, Q is the reconstructed filter, λ1 and λ2 are decay factors, and N is the number of layers. Update() updates the parameters based on our update scheme.

Input: a minibatch of inputs and their labels, unbinarized filters C, modulation filters M, and learning rates η1 and η2, corresponding to C and M, respectively.
Output: updated unbinarized filters C^{t+1}, updated modulation filters M^{t+1}, and updated learning rates η1^{t+1} and η2^{t+1}.

1: {1. Computing the gradients with respect to the parameters:}
2: {1.1. Forward propagation:}
3: for k = 1 to N do
4:     Ĉ ← Binarize(C)
5:     Compute Q via Eq. 3.13–3.14
6:     Compute the convolutional features using Eq. 3.15–3.17
7: end for
8: {1.2. Backward propagation:}
9: {Note that the gradients are not binary.}
10: Compute δQ = ∂L/∂Q
11: for k = N to 1 do
12:     Compute δĈ using Eq. 3.20 and Eq. 3.22–3.23
13:     Compute δM using Eq. 3.24 and Eq. 3.26–3.27
14: end for
15: {2. Accumulating the parameter gradients:}
16: for k = 1 to N do
17:     C^{t+1} ← Update(δĈ, η1) (using Eq. 3.21)
18:     M^{t+1} ← Update(δM, η2) (using Eq. 3.25)
19:     η1^{t+1} ← λ1·η1
20:     η2^{t+1} ← λ2·η2
21: end for
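To make the forward pass (steps 3–7) concrete, the following is a minimal PyTorch-style sketch of one layer. The helper names (binarize, mcn_forward), the sign-based Binarize(), and the element-wise reconstruction Q = Ĉ ∘ M (with M reduced to a single k×k plane for brevity) are our assumptions, standing in for Eq. 3.13–3.17:

```python
import torch
import torch.nn.functional as F

def binarize(C: torch.Tensor) -> torch.Tensor:
    # Sign binarization is one common choice; the exact Binarize()
    # used in Algorithm 1 may differ.
    return torch.sign(C)

def mcn_forward(x: torch.Tensor, C: torch.Tensor, M: torch.Tensor) -> torch.Tensor:
    """Steps 4-6 of Algorithm 1 for a single layer.

    C: unbinarized filters, shape (out_ch, in_ch, k, k).
    M: modulation filter, here a single (k, k) plane broadcast over C_hat
       (the thesis uses a set of planes M_j; one plane keeps the sketch short).
    """
    C_hat = binarize(C)                # step 4: C_hat <- Binarize(C)
    Q = C_hat * M                      # step 5: reconstruct Q (assumed element-wise, Eq. 3.13-3.14)
    return F.conv2d(x, Q, padding=1)   # step 6: convolutional features (Eq. 3.15-3.17)

# Toy usage: 16 filters of size 3x3 over 3 input channels.
x = torch.randn(1, 3, 32, 32)
C = torch.randn(16, 3, 3, 3, requires_grad=True)
M = torch.rand(3, 3, requires_grad=True)
features = mcn_forward(x, C, M)        # shape (1, 16, 32, 32)
```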

where η2 is the learning rate for the modulation filters M. Furthermore, we have:

$$\frac{\partial L_S}{\partial M} = \frac{\partial L_S}{\partial Q} \cdot \frac{\partial Q}{\partial M} = \sum_{i,j} \frac{\partial L_S}{\partial Q_{ij}} \cdot \hat{C}_i, \qquad (3.26)$$

Based on Eq. 3.18, we have:

$$\frac{\partial L_M}{\partial M} = \theta \sum_{i,j} \left( C_i - \hat{C}_i \circ M_j \right) \cdot \hat{C}_i. \qquad (3.27)$$
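As a numerical sanity check, Eq. 3.26 and Eq. 3.27 can be evaluated with explicit tensor operations. The shapes below and the reading Q_ij = Ĉ_i ∘ M_j are our assumptions, and the sign convention follows Eq. 3.27 as written:

```python
import torch

n, m, k, theta = 4, 2, 3, 1e-3              # filters, modulation planes, kernel size, theta
C = torch.randn(n, k, k)                     # unbinarized filters C_i
C_hat = torch.sign(C)                        # binarized filters
M = torch.rand(m, k, k)                      # modulation planes M_j
dL_dQ = torch.randn(n, m, k, k)              # delta_Q = dL_S/dQ from back-propagation

# Eq. 3.26: dL_S/dM = sum_{i,j} (dL_S/dQ_ij) * C_hat_i
dLS_dM = (dL_dQ * C_hat.unsqueeze(1)).sum(dim=(0, 1))         # shape (k, k)

# Eq. 3.27: dL_M/dM = theta * sum_{i,j} (C_i - C_hat_i o M_j) * C_hat_i
resid = C.unsqueeze(1) - C_hat.unsqueeze(1) * M               # shape (n, m, k, k)
dLM_dM = theta * (resid * C_hat.unsqueeze(1)).sum(dim=(0, 1))

dM = dLS_dM + dLM_dM                         # delta_M used in step 13 of Algorithm 1
```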

Details about the derivatives of the center loss can be found in [245]. These derivations show that MCNs can be learned with the back-propagation (BP) algorithm. The quantization process leads to a new loss function via a simple projection function, which does not affect the convergence of MCNs. We describe our algorithm in Algorithm 1.
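To round out Algorithm 1, the accumulation stage (steps 15–21) can be sketched as follows. Plain gradient descent stands in for the thesis's Update() rule (Eq. 3.21 and Eq. 3.25), λ1 and λ2 are the decay factors from the algorithm's caption, and all names and default values are ours:

```python
import torch

def update_step(C, M, dC_hat, dM, eta1, eta2, lam1=0.95, lam2=0.95):
    """Steps 16-21 of Algorithm 1 for one layer.

    dC_hat is applied directly to the unbinarized filters C, a common
    straight-through-style choice standing in for Eq. 3.21; lam1 and
    lam2 are the decay factors (values assumed here).
    """
    with torch.no_grad():
        C -= eta1 * dC_hat                   # step 17: C^{t+1} <- Update(delta_C_hat, eta1)
        M -= eta2 * dM                       # step 18: M^{t+1} <- Update(delta_M, eta2)
    return C, M, lam1 * eta1, lam2 * eta2    # steps 19-20: decay the learning rates
```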

3.4.4 Parameter Evaluation

θ and λ: Eq. 3.18 contains the hyperparameters θ and λ, which weight the filter loss and the center loss, respectively. Their effect is evaluated on CIFAR-10 with a 20-layer MCN of width 16-16-32-64; the architectural details can be found in [281] and are also shown in Fig. 3.6. The Adadelta optimization algorithm [282] is used during the training process,